This course is part of a series of courses for the Informatics Technology for Cancer Research (ITCR) called the Informatics Technology for Cancer Research Education Resource. This material was created by the ITCR Training Network (ITN) which is a collaborative effort of researchers around the United States to support cancer informatics and data science training through resources, technology, and events. Our courses feature tools developed by ITCR Investigators and make it easier for principal investigators, scientists, and analysts to integrate cancer informatics into their workflows. Please see our website at https://www.itcrtraining.org/ for more information.
Cancer datasets are plentiful, complicated, and hold information that may be critical for the next research advancements. In order to use these data to their full potential, researchers are dependent on the specialized data tools that are continually being published and developed. Bioinformatics tools can often be unfriendly to their users, who often have little to no background in programming (Bolchini et al. 2008). The usability and quality of the documentation of a tool can be a major factor in how efficiently a researcher is able to obtain useful findings for the next steps of their research.
Increasing the usability and quality of documentation for a tool is not only helpful for the researcher users, but also for the developers themselves – the many hours of work put into the product will have a higher impact if the tool is usable by the target user community. Of the bioinformatics tools surveyed by Duck et al. (2016), 70% were not reused beyond their introductory publication. Even the most well-programmed tool will be overlooked by the user community if there is little to no user-friendly documentation or if it was not designed with the user in mind.
The course is intended for cancer informatics tool developers, particularly those creating tools as a part of the Informatics Technology for Cancer Research (ITCR).
The course includes hands-on exercises with templates for building documentation and tutorials for cancer informatics tools. Individuals who take this course are encouraged to use these templates as they follow along with the course material to help increase the usability of their informatics tool.
Tool development is an exciting but long process – filled with lots of careful programming, tedious troubleshooting, but also ‘Aha’ moments that ultimately can result in an amazing product that you should be proud of!
Tina the Tool Developer, perhaps like you, has just gotten her product working well and many of the bugs have been sorted out. Tina’s awesome tool is working exactly as designed and Tina is excited to get her tool out there to be used by the community!
This is indeed cause for celebration! Perhaps researchers like Uri the Tool User will come across Tina’s awesome tool and share in Tina’s enthusiasm for the project! Tina’s bioinformatics tool may be just what they were needing for their research project!
Uri the Tool User can’t wait to apply Tina’s awesome tool to their project! But it may not be long before Uri encounters errors or has questions about Tina’s awesome tool, no matter how high the quality of Tina’s programming.
Often users like Uri, particularly in the biology and cancer fields, have little to no programming experience. Even if a user does have programming experience, they are still unfamiliar with how Tina has set up the tool. The tool may even be working exactly according to Tina’s vision, but if users like Uri do not understand Tina’s vision or the basic programming principles that Tina might take for granted, it can lead to a lot of frustration and inefficiently spent time.
If the tool’s documentation is non-existent, scarce, out-of-date, or filled with too much jargon, the chances that Uri will be able to successfully and efficiently create a product with the tool are drastically diminished.
Lack of usability often leads users to ditch even the most well-programmed of tools.
This is the unfortunate and all-too-common result of many bioinformatics tools.
The lack of focus and education on usability in bioinformatics tool development is a disservice not only to the progress of cancer research, but also to the tool developers themselves, who have spent countless work hours and effort on the development of cancer tools.
We know that bioinformatics tool development doesn’t occur in a vacuum. User designers in the field of bioinformatics have commented on reasons why documentation and usability sometimes suffer for bioinformatics tools:
Unfortunately this specific course cannot address issues 1 and 2, but will attempt to help with problem 3.
We realize many tool developers are averse to documentation like some are to spinach, feeling unenthused about the process of creating documentation. The documentation process requires a different skill set from tool development itself, meaning many developers were likely not attracted to tool development because of documentation and may not be sure how to craft good documentation (Wolf 2016). They may know it’s good for their tool, but they just aren’t enthused about it. In this course, we’d like you to view your tool as Popeye, where documentation is the spinach that will make your tool stronger!
The effort of creating documentation has a high payoff for the continued success of a tool as a whole!
Returning to our cast of characters, let’s say that Tina the Tool Developer had the time and knowledge to create awesome documentation for her tool.
Uri the Tool User is still likely to encounter errors and problems, but with thorough and easy-to-digest documentation, Uri is better equipped to troubleshoot these problems! They may also learn more about the features and limitations of the tool that will better guide Uri’s next steps!
Being equipped with user-centered documentation, Uri is more likely to be able to reach the next steps of their research and potentially share a publishable result! Tina’s tool is now more likely to be cited in publications, or other forms of media.
This rewards Uri for having used Tina’s tool, making Uri likely not only to keep using the tool for their next projects but also to help spread the word about how great their experience with Tina’s tool was.
This means that Tina may have a larger user base for her tool, which will help Tina with future funding opportunities and with making connections that will help her create more awesome tools!
Well-documented tools help developers better maintain their code in the future because they may forget the mechanics of their tool over time. If Future Tina has to divert her time and effort to another project but then returns to do tool maintenance, documentation may help jog her memory!
Thorough and easy-to-digest documentation may also help other tool developers contribute features or fix bugs in Tina’s tool. Here Colin the Contributor was able to read Tina’s awesome documentation. It not only got him excited about the tool, but allowed him to program a new feature which he sent to Tina.
Now that you are hopefully energized and ready to create documentation for your tool, let’s talk a bit about user design concepts!
Creating tools that are easy to use starts with thinking about your user’s perspective. In other words, user design is an exercise in applied empathy (Matos et al. 2013).
This is why a common saying in user design is “You are not your user” (Alexakis, n.d.). Although you may have a lot in common with your user, this saying is based on the idea that you should not assume your user knows what you know or thinks like you do. For example, a warning message that may seem perfectly clear to you as a developer may be a foreign language to your user.
As compared to yourself, your typical user will likely have a different:
And most importantly, your user does not know your tool like you do! You have spent many, many hours developing this tool, and it’s unrealistic and impractical for them to spend the same number of hours with your tool that you have.
Also keep in mind that users are humans in a context. Humans have demands in their lives distracting them, or may have been working a long day and be tired, frustrated, or distracted. Making your tool as easy as possible to use increases the likelihood of your user continuing to stick with your tool and even becoming an advocate for your tool to their colleagues!
On a general level, there are some characteristics we know about bioinformatics tool user communities.
Typical users of bioinformatics tools are generally:
However, the bioinformatics user community also includes a variety of individuals with different roles and experiences. Mulder et al. (2018) described 10 user personas for bioinformatics software, all with their own skills and competencies:
Additionally, users may be at various stages in their education (undergraduates, graduates, postdocs, etc.) and may have varying experience and time constraints.
Write down what you know (or think you know) and try to identify any knowledge gaps you might have about your user community.
Keep the questions about your user community in mind and in a later chapter, we’ll go into more detail about conducting user research to address any knowledge gaps you may have about your user community.
While finding out about your user community is critical, there are also principles we can discuss that are common to all users/humans.
Humans are drawn to intuitive visuals. Visuals are efficient means of communication and help users absorb information better than long-winded paragraphs (though visuals need an accompanying explanation too).
Sometimes this is particularly helpful for complicated concepts. For example, BEDtools (Quinlan and Hall 2010) allows for the manipulation of genomic sequences in BED files. Some of these principles can be complicated to visualize, but the authors of BEDtools do a great job of using visuals to explain each function:
Here, this figure explains how the merge function works given a particular set of ranges.
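To make the idea of merging ranges concrete in words and code as well as in a picture, here is a minimal Python sketch. It is only an illustration of the general concept, not how BEDtools itself is implemented; the function name `merge_intervals` and the example coordinates are made up for this sketch.

```python
def merge_intervals(intervals):
    """Combine overlapping (start, end) ranges into single ranges.

    Illustrative only: a simplified stand-in for the idea behind a
    'merge' operation, not the BEDtools implementation.
    """
    merged = []
    # Sorting by start position means overlapping ranges end up adjacent.
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            # No overlap: start a new range.
            merged.append((start, end))
    return merged


# Hypothetical ranges on a single chromosome.
ranges = [(100, 200), (180, 250), (400, 500)]
print(merge_intervals(ranges))  # [(100, 250), (400, 500)]
```

A short snippet like this next to a figure gives users two ways into the same concept: the visual for intuition and the code for exact behavior.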
What someone considers jargon is very relative to their own experiences.
Terms that may seem like common knowledge to you may be foreign to your user.
For example, something seemingly commonplace to you, like TSV, may not be something your user understands. To help smooth over jargon-related barriers, spell out abbreviations the first time you mention them (e.g. ‘tab separated values (TSV)’). This doesn’t mean you need long-winded explanations of every term; instead, you can quickly link out to an article or website with information about a term you’ve used (e.g. tab separated values file (TSV)). This saves both your time and your user’s without making your explanations too long-winded.
No matter how much you have perfected your tool, it will never be perfect, especially since software becomes outdated over time. Because of this and other unknown unknowns, a usable tool needs a way for a user to let the developers know when something isn’t working. This might be a direct and obvious break like a bug or broken link, but it could also be something more subtle that also requires your attention. We’ll discuss this in more detail in a later chapter.
Now that we’ve discussed some major principles about users and design, let’s dive into talking about how documentation can help!
In this chapter we are going to cover the major aspects of a well-documented tool. In the subsequent chapters, we will talk about each of these components in more detail, providing relevant examples and tools.
Before we get into the technical information in your documentation, the first thing that should be obvious to your user is why they should want to use your tool! What need of your users does your tool fulfill? If this is not glaringly obvious, users will move on without realizing how valuable your tool could be for their research!
This should be the first thing your user sees on the main page of your tool. If it currently is not clear, take this time to workshop one or two sentences that explain the ‘why’ of your tool. As you craft this sentence, think about the needs of your user and how to summarize your tool’s purpose in a brief, punchy way. Stay away from jargon unless it’s jargon that you know your user will understand.
Examples of tools with their ‘why’ stated clearly and prominently on their web page:
Gene Set Enrichment Analysis (GSEA) is a computational method that determines whether an a priori defined set of genes shows statistically significant, concordant differences between two biological states (e.g. phenotypes).
Here, GSEA tells us the exact kind of question we can ask and what input is needed.
Salmon is a tool for wicked-fast transcript quantification from RNA-seq data. It requires a set of target transcripts (either from a reference or de-novo assembly) to quantify.
Salmon tells us it’s for transcript quantification and that its particular strength is that it’s “wicked-fast.”
A getting started section will be a new user’s first introduction to your tool. It will tell them specific steps they need to “get started” (hence why we call it this) – no long-winded explanations, just the quickest way to reach some sort of output.
In the clearest steps possible, a getting started section will tell the user how to:
It’s crucial that the steps here are simple and easy to follow or you risk losing new users before they even get going.
A set of how-to examples is like a cookbook of recipes: it demonstrates, step by step, the most common uses for your tool once users have completed the getting started steps.
How-to examples give users next steps to get more comfortable with your tool and show them what an analysis might look like.
A reference guide is like a dictionary in which a user can look up items as questions arise. A user will likely have a specific question about a function, parameter, data type, or option and will want to be able to navigate to information about that item. Users who end up looking through your reference guide are likely a bit invested and may have already gone through the getting started section and/or how-to examples.
Comments in your source code are also a part of documentation – and likely the first part of the documentation you worked on! In most instances, if your tool is functioning fine, code will probably only be looked at by advanced and/or highly invested users. But also recall that documentation is not only for your user but also for future you and existing or potential collaborators.
Try as you might, you will not be able to predict every way that a user may engage with your tool. Your user may encounter errors or quirks that you did not foresee but that would be helpful for you to know about. Your documentation should direct your users where they should send any comments or concerns. You should also make sure that this feedback method is something you can check up on and respond to regularly. We’ll also discuss how to conduct usability research to get the most informative feedback on your tool.
We encourage you to create the pieces of documentation that we will discuss further in the rest of this course! We have template documentation you can use as a starting point.
If you are creating documentation to accompany a package you are submitting to Bioconductor or Galaxy, we also have more specific templates and recommendations for those instances.
This document is a checklist that summarizes the major aspects that should be included in a tool’s documentation. We recommend using it to evaluate the documentation for an existing tool and identify any gaps you may need to address, or as a to-do list for creating new documentation that you can check off as you follow along with this course.
There are two options we suggest for creating documentation as you follow along with this course.
Option 1) Use these markdown templates essentially as they are (after you fill them in) and add them to an existing repository.
Pros: Is easier and quicker.
Cons: Is not as user-friendly as option 2.
Option 2) Clone a repository with these templates and set up a MkDocs GitHub Pages site. Slightly more work, but a very nice end result; see demo here.
Pros: Documentation sites in this format are easy to navigate and likely familiar to your user.
Cons: Will require you to use the mkdocs package to get this set up.
Here’s what’s in that folder:
templates/
├── well_documented_checklist.md
├── getting_started_template.md
├── how_to_examples.md
├── reference_guide_template.md
├── bioconductor-guides/
│ ├── bioconductor_example_script.R
│ ├── bioconductor_vignette_template.Rmd
│ └── README.md
└── galaxy-guides/
Use this Template to get started. On your computer’s command line:
- git clone your new repository you made from our template.
- Run mkdocs serve to see a preview of your docs.
- Edit the files in the docs/ folder.
- Run mkdocs build and then mkdocs serve to see a preview.
- When you run mkdocs gh-deploy, it will return the web address of your new site – go to that address and bask in the success of your newly made documentation!
Now that we have a basic structure and plan for our documentation, let’s discuss each section of this documentation in more detail!
A getting started section is new users’ first experience with your tool. It can also be the most frustrating experience for your user if installation doesn’t happen smoothly.
Our goal for our getting started section is to guide our new users through installation steps as quickly and smoothly as possible then send them to a brief tutorial to show off the awesomeness of your tool!
It’s hard to get started if you don’t know where to begin! Your getting started section should be the easiest page to find. Feature the link to your getting started page prominently on your landing page and in your navigation bar, if you have one.
Provide users with the introductory concepts of the tool; briefly expand a bit more on the ‘why’ that they already saw.
If your users’ needs fit your description, this will fuel them with the motivation to get through the first big hurdle: installation.
Before getting to install steps, a special consideration: Does your tool have multiple ways to run it? For example, can it be run either through a GUI or the command line? Describe this to your users so they get shuttled to the method of running your tool that is right for them.
Installation is the first and perhaps biggest hurdle your user will encounter with your tool. The clearer and more specific these steps, the better. Mangul, Mosqueiro, et al. (2019) found that tools that required more installation steps (but didn’t describe these steps adequately) were less likely to be installed successfully, and tools that were less likely to be installed successfully had significantly fewer citations!
If installation happens through the command line, provide copy-and-paste commands that your user can use as-is. If parts of these commands need to be tailored, call attention to where the tailoring needs to happen and how your user can determine what they need to put there. Fill-in-the-blank cues can be handy for these scenarios.
Tell your users what to expect. Do some steps take more time than others? Warn them about that. Are there output prompts that may not be intuitive but are to be expected? For example, sometimes a regular red text installation message may indicate things are working fine, but if a user doesn’t know what the text means, sometimes they will try to interpret red text as meaning something bad has occurred.
Where it makes sense, use screenshots as assurances to the user that they are on the right track. Being able to see that their screen matches what is shown in your screenshots reassures users that things are progressing correctly. Conversely, if something does not match, it can help them narrow in on a problem.
Install steps should also try to address any common pitfalls – particularly how different operating systems might require different steps. You may consider having separate sections or pages to describe install steps on different operating systems.
What dependencies does installing your tool require? Will these be installed automatically by the steps you describe or does your user need to install other software before being able to install your tool? This can be a big roadblock to users if dependencies and how to install them are not addressed.
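One way to soften the dependency roadblock is to give users a quick check they can run before installing. The Python sketch below is only an illustration under assumptions: the function name `find_missing_dependencies` and the listed requirements are hypothetical, and your tool’s real dependencies will differ.

```python
import importlib.util
import shutil


def find_missing_dependencies(python_packages, command_line_tools):
    """Return the names of required dependencies that cannot be found."""
    missing = []
    for package in python_packages:
        # find_spec returns None when a top-level package is not installed.
        if importlib.util.find_spec(package) is None:
            missing.append(package)
    for tool in command_line_tools:
        # shutil.which returns None when a program is not on the PATH.
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


# Hypothetical requirements for an imaginary tool.
missing = find_missing_dependencies(["json", "csv"], ["python3"])
if missing:
    print("Please install these before continuing: " + ", ".join(missing))
else:
    print("All dependencies found. You are ready to install the tool!")
```

The message names exactly what is missing, so a user like Uri knows what to do next instead of facing a cryptic failure halfway through installation.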
To recap:
Installation steps can be tricky – and admittedly hard to give guidance on when individual computers’ setups can differ so much, but the more you are able to workshop your guidance to your users here, the more they will appreciate it and stick with your tool!
Your getting started section should give your users the basic concepts they need for running your tool – a knowledge foundation that they can build upon as they continue to explore and follow your How-to Examples.
Installation steps are not fun, so the latter part of your getting started section should lead your user into a quick tutorial that will reward them for making it through the hard part!
Give your users the fewest steps needed to produce a rewarding result that will excite them about continuing to use your tool! Use this opportunity to show off the simplicity and awesomeness of your tool!
This rewarding result might be a cool visual or a plot – but also should demonstrate the most popular thing your users would like to see.
Now that your user has successfully installed your tool and understands the basic idea, let them know where they can find more examples to keep the learning train going!
Snakemake has a great getting started section (Molder et al. 2021). The makers of Snakemake tell their users how to install Snakemake in different situations while keeping dependencies in mind, and right after that they have a short tutorial to entice their users!
GSEA introduces its users to multiple options for running the tool and nicely uses reassuring screenshots throughout to let users know whether they are on the right track (Subramanian et al. 2005)!
Use the template getting started document to start your own getting started section either by using the markdown template directly, or navigating to the MkDocs repository you set up in the previous chapter.
While getting started sections are geared toward brand-new users, how-to examples are geared toward intermediate users that have successfully installed your tool and now want to know more about what they can do with it. How-to examples can turn these moderately interested users into enthusiastic and invested fans of your tool!
How-to examples are like recipes in a cookbook. We can generally assume your user has found the kitchen, now give them various sets of steps to create something awesome!
Our goal for our how-to examples is to show off the best and most exciting use cases of your tool!
Note: In some contexts, like Bioconductor, how-to examples are called vignettes – we consider these to be the same.
Users won’t go through examples that demonstrate analyses they aren’t interested in. So although you may have your favorite pet functions of your tool, it doesn’t necessarily mean that those are the prime interests of your user (though perhaps after you create a good set of examples you can return to make examples of your favorites). You may want to do some asking around, or conduct some user research to find out what your users are most interested in. You also are not restricted to just one example; users love having a full library of how-to examples to choose from! That being said, for your own time and planning, you may want to start by creating examples for the most common use cases and then move to more fringe cases.
Examples should explicitly give every step needed to reproduce your result. Keep a special look out for steps you complete that you may take for granted that your user would know. For example, if your user needs to change to a specific directory to run a command, don’t assume that they will know that for sure.
For command line based tools, provide the exact code your user needs to run. Ideally this example can be provided as a notebook or script so your user can run it directly.
In the case of a GUI, provide screenshots or a video tutorial.
Getting data formatted correctly is another huge hurdle for users, and although you should give guidance on how data should be formatted for your tool, your examples should not depend on your user’s data. Instead, provide your users with example data that your example code directly downloads (or that is available through your GUI). This has the added benefit of being a positive control for when users are troubleshooting the formatting of their own data later on, but doesn’t force them to face that battle before they can follow your example.
Make sure your example adequately introduces the example data: what are the measurements from, and what was the goal of this dataset? And of course, link to the source of the data and cite it!
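As a rough sketch of what “example code that directly downloads the data” can look like, here is a small Python helper. The function name `fetch_example_data`, the URL, and the file paths are all placeholders invented for this illustration; swap in wherever your own example data actually lives.

```python
import urllib.request
from pathlib import Path


def fetch_example_data(url, destination):
    """Download the example dataset once and reuse the local copy afterwards."""
    destination = Path(destination)
    if not destination.exists():
        # Create the folder if needed so users do not have to.
        destination.parent.mkdir(parents=True, exist_ok=True)
        urllib.request.urlretrieve(url, str(destination))
    return destination


# Example usage (the URL below is a placeholder, not a real dataset):
# data_path = fetch_example_data(
#     "https://example.org/my_tool/example_counts.tsv",
#     "example_data/example_counts.tsv",
# )
```

Because the file is only downloaded when it is not already present, users can rerun the example as often as they like, and the saved copy doubles as a correctly formatted reference when they start preparing their own data.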
Example code is not the same as backend code. Although example code should also be functional and work, it’s primarily meant to teach, so, even more than usual, code in examples should prioritize clarity over cleverness or even brevity.
This means your examples should include the most easily readable code you can muster – sometimes this means extra workshopping to reach peak clarity. Give commentary at each and every step – don’t assume your users understand your typical conventions. Also, in the interest of being as readable as possible, try to stick to a styling convention – s p a c i n g matters!
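To illustrate the difference between clever and clear, here is a small Python comparison. The gene names and counts are invented for this sketch; both versions compute the same averages, but only the second is written for a reader who is still learning.

```python
# Hypothetical expression counts: gene name -> counts across three samples.
counts = {"TP53": [10, 12, 14], "BRCA1": [3, 5, 4]}

# Clever but hard for a new user to unpack:
means_clever = {g: sum(c) / len(c) for g, c in counts.items()}

# Clearer for teaching: one small step per line, each explained.
mean_counts = {}
for gene_name, sample_counts in counts.items():
    # Add up the counts for this gene across all samples.
    total = sum(sample_counts)
    # Divide by the number of samples to get the average.
    number_of_samples = len(sample_counts)
    mean_counts[gene_name] = total / number_of_samples

print(mean_counts)  # {'TP53': 12.0, 'BRCA1': 4.0}
```

The longer version takes more lines, but each line does one thing, the variable names say what they hold, and the comments explain why each step is there.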
Related to this, your examples’ code should model best practices.
Pretend you are the model user of your tool – how should your users interact with your tool? This means keeping in mind the important basics:
Your user already made it through the installation process, so try not to make them add more installation steps to follow your examples unless absolutely necessary. If it is absolutely necessary, should this package be added as a dependency? Can you have it automatically installed for the user if it is critical to common uses?
If your users follow along with your examples successfully, their next step will probably be to tailor your examples to their own questions. Whether you intend it or not, your examples will probably be used as a template framework for your user’s analysis. Knowing this, try to highlight places that users will absolutely need to change the code and other places that they might want to personalize it. Providing them with more resources about options and possibilities is always nice too.
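One simple way to highlight what needs changing is to gather those settings at the top of the example under loud labels. The Python sketch below assumes a made-up filtering step; the file path, threshold, gene totals, and the `filter_genes` function are all hypothetical and stand in for whatever your tool actually does.

```python
# === REQUIRED: replace with the path to your own file ===
input_path = "example_data/example_counts.tsv"  # hypothetical example file

# === OPTIONAL: adjust to suit your analysis ===
minimum_total_count = 10  # genes below this total are filtered out


def filter_genes(total_counts, threshold):
    """Keep only genes whose total count meets the threshold."""
    return {gene: total for gene, total in total_counts.items() if total >= threshold}


# Hypothetical totals standing in for values that would be read from input_path.
example_totals = {"TP53": 36, "BRCA1": 12, "GENE_X": 2}
print(filter_genes(example_totals, minimum_total_count))
# {'TP53': 36, 'BRCA1': 12}
```

Users who copy the example as a template can see at a glance which lines they must edit and which they can leave alone.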
To recap:
DESeq2 has excellent vignettes! Love, Huber, and Anders (2014) walk through the most common use cases of DESeq2, providing data and explaining the set up. They efficiently move into other scenarios, explaining common questions and areas of nuance along the way.
QIIME2 also has an extensive set of examples! Bolyen et al. (2019) give a great setup and hypothesis for a question with a given dataset and walk through each step to answer that question. At the very end of the example they also provide the end result for comparison!
Use the how-to example templates to start your own how-to examples either by using the markdown template directly, or navigating to the MkDocs repository you set up in the previous chapter.
If your tool’s destination is Bioconductor or Galaxy, see our specific guidance on those repositories’ examples:
For Bioconductor vignettes:
For Galaxy vignettes:
Reference guides are the dictionary of your tool: they aren’t meant to be read front to back, but the best ones are easily searchable. Your user will have something in mind that they are trying to find information on – the quicker they can find it, the quicker their question can be answered.
Our goal for a reference guide is to be as comprehensive, navigable, and as always, as clear as possible.
As with our other documentation sections, no matter how well a reference guide is crafted, it is no use if no one can find it. Make sure that the link to the reference guide is easy to find.
Users will be digging into your reference guide looking for a specific entry. Organizing the reference guide alphabetically is a start. If you are able to make terms searchable, that’s even better, but at the very least, a reference guide that is easy to scan visually can serve a similar function.
All items should be covered in the reference guide – every single thing. This includes all:
For Bioconductor packages, there’s specific guidance on manuals.
After installation, getting data formatted correctly is perhaps the next largest hurdle users will need to deal with.
Ideally, your tool can use a data format that is common. But the more that your tool is particular about an odd data format, the more your documentation needs to be specific about what the odd data format looks like.
GSEA has great descriptions of their data formats with examples of what the data formats look like.
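Alongside written descriptions of a data format, a tool can also check the format and explain problems in plain language. The Python sketch below is one possible approach, not a prescribed one: the required column names and the `check_columns` function are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["gene", "count"]  # hypothetical column names


def check_columns(tsv_text, required=REQUIRED_COLUMNS):
    """Raise a plain-language error if required columns are missing."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    # The first line of the file is expected to hold the column names.
    header = next(reader, [])
    missing = [name for name in required if name not in header]
    if missing:
        raise ValueError(
            "Your file is missing these column(s): " + ", ".join(missing)
            + ". The first line should contain tab-separated column names, "
            + "for example: " + "<tab>".join(required)
        )
    return header


good_file = "gene\tcount\nTP53\t10\n"
print(check_columns(good_file))  # ['gene', 'count']
```

An error message that says what is missing and shows what a correct first line looks like saves the user a trip back to the reference guide.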
Consistency helps your users know what to expect and know where to find information! Each entry in the reference guide should have the same format and sections, in the same order.
To recap:
QIIME2 has a great reference guide! Bolyen et al. (2019) cover all items and terms with lots of links to more information or related entries.
Bioconductor packages adhere to a consistent reference guide format, which makes it easier for users to find information once they are familiar with the format. A typical package’s reference guide looks like this.
For R package documentation: Follow the advice from Hadley Wickham in the R Packages book, which includes using the roxygen2 package to automatically render those .Rd files!
For Python package documentation: Follow the docstrings guidance and instructions here.
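As a small illustration of what a docstring-based reference entry can look like, here is a Python function documented in the NumPy docstring layout. That layout is just one common convention chosen for this sketch (the linked guidance may recommend a different one), and the function `normalize_counts` is made up for the example.

```python
def normalize_counts(counts, total):
    """Divide each raw count by a total.

    Parameters
    ----------
    counts : list of float
        Raw counts, one value per gene.
    total : float
        The value to divide by; must be greater than zero.

    Returns
    -------
    list of float
        Each count divided by ``total``.

    Examples
    --------
    >>> normalize_counts([2, 6], 8)
    [0.25, 0.75]
    """
    if total <= 0:
        raise ValueError("total must be greater than zero")
    return [count / total for count in counts]


# The docstring travels with the function, so it can be shown to users on request.
print(normalize_counts.__doc__)
```

Using the same sections in the same order for every function gives users the consistency described above: once they have read one entry, they know where to look in all the others.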
If your tool’s destination is Bioconductor, see their specific guidance on manual pages.
For other general purposes, you can use our reference guide template to start your own reference guide either by using the markdown template directly, or navigating to the MkDocs repository you set up in the previous chapter.
Code documentation goes beyond your user. It’s a part of writing good code and helps your collaborators and future you!
Most users will probably not look at your code directly – those who do are probably:
Plenty of people have discussed good code comments at length so we’ll refer to those discussions here:
Some major points from these articles:
How do you know if your code is working? You test it and get feedback! Similarly, how do you know if your tool is working for your user? Ask them for feedback!
At the most basic level, you need to provide your users a way to alert you if something with your tool is not working appropriately.
It may sound disappointing that a user has found a problem but this is something to be happy about!
Providing a method of contact to your users doesn’t mean you need to give users your personal email. In fact, that is probably not the most practical way to keep user queries organized.
Example contact method ideas:
Whatever method of contact you provide your users, just make sure it’s something that works for you and your team to respond to.
In whatever contact method you settle on, give your users a way to indicate if they are willing to chat with you or do testing to provide even more feedback.
Depending on your time and resources, you can do a lot with usability testing. This excellent article by Csontos (2019) takes us through how to conduct usability testing; we will echo its main points here.
What to use usability testing for:
- Identifying main issues in the usability of a product
- Checking if users understand the steps to carry out a task and the navigation
- Observing how easily and quickly they accomplish tasks
- Validating the value proposition of an app or website – do your potential customers understand it?
To summarize the steps laid out by Csontos (2019) for conducting usability research:
Step 1) Plan out your study - clarify what you want to learn and write a plan and script.
See more advice about writing usability testing scripts here:
Step 2) Find user participants.
Who is interested in and available for being a test participant? You might be able to recruit people by setting up a feedback form, but you can also use word of mouth and ask around.
A very important point from Csontos (2019):
You can use these prompts from Krug (2010) to help you know what to say to help create a non-judgmental atmosphere for testing.
Step 4) Analyze & Report - Sit down with your notes and recordings and look for patterns.
More reading on usability testing:
Set up a method of contact for your users. We have a mock user feedback form set up here. This form shouldn’t be used as is, but could be tailored to ask more specific questions about the aspects of your tool you want to learn about.
Go through All You Need to Know to Run Successful Usability Testing and construct a plan for usability testing following the steps we discussed above from the article. Think about what in your tool you want to test and write a plan for it.
For all cartoons:
Avataaars by https://getavataaars.com/.
Icons by https://thenounproject.com/ License CC BY-NC-ND 2.0.
Emojis by OpenMoji License: CC BY-SA 4.0.